Experiments on Candidate Data for Collocation Extraction

نویسندگان

  • Stefan Evert
  • Hannah Kermes
چکیده

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Synonymy in Collocation Extraction

This paper describes the use of WordNet in a new technique for collocation extraction. The approach is based on restrictions on the possible substitutions for synonyms within candidate phrases. Following a general discussion of collocations and their applications, current extraction methods are briefly described. This is followed by a detailed description of the new approach and results and eva...

متن کامل

Improving effectiveness of mutual information for substantival multiword expression extraction

0957-4174/$ see front matter 2009 Elsevier Ltd. A doi:10.1016/j.eswa.2009.02.026 * Corresponding author. E-mail addresses: [email protected] (W. Zh Yoshida), [email protected] (X. Tang). One of the deficiencies of mutual information is its poor capacity to measure association of words with unsymmetrical co-occurrence, which has large amounts for multi-word expression in texts. Moreover, thre...

متن کامل

Preference Learning in Terminology Extraction: A ROC-based approach

A key data preparation step in Text Mining, Term Extraction selects the terms, or collocation of words, attached to specific concepts. In this paper, the task of extracting relevant collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as relevant/irrelevant. The candidate terms are described along 13 standard statistical criteria meas...

متن کامل

Multilingual collocation extraction with a syntactic parser

An impressive amount of work was devoted over the past few decades to collocation extraction. The state of the art shows that there is a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocatio...

متن کامل

Improving Collocation Extraction for High Frequency Words

The purpose of this paper is to introduce an alternative word association measure aimed at addressing the under-extraction collocations that contain high frequency words. While measures such as MI provide the important contribution of filtering out sheer high frequency of words in the detection of collocations in large corpora, one side effect of this filtering is that it becomes correspondingl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003